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METHOD AND SYSTEM FOR A MULTI -CHANNEL ADD- COMPARE -SELECT UNIT 

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] This invention relates to the field of error 
correction, and more particularly relates to a method and 
system for a multi-channel add-compare-select unit. 

Description of the Related Art 

[0002] The "Add-Compare-Select" (ACS) circuit is a well- 
known building block for a variety of circuits, including 
circuits such as the Forward Error Correction (FEC) circuit. 
The ACS circuit (ACS unit) forms the basis of most 
convolutional/trellis decoders, as well as comparable 
circuits. The basic function of an ACS unit is to add its 
inputs together in some combination, compare the results of 
those additions, and in doing so, select one of those results 
as the ACS unit's output. 

[0003] The function of an ACS unit in a trellis decoder 
can be used as an example. In the simplest terms, an ACS 
unit implemented in a trellis decoder takes the cost of each 
branch into a state within the trellis, and adds the total 
cost of the source state for that branch. This results in 
two new state costs (e.g., A and B) . These two new state 
costs are then compared to one another, and the smaller cost 
of the two is selected (e.g., B<A?) . This selected cost is 
output as the new cost for the current state under 
consideration. An FEC design may contain hundreds of such 
ACS units. The output of each of these ACS units feeds back 
into itself, or to the inputs of another of the ACS units. 
The ACS units' operation needs to execute in a single symbol 
period in order to successfully process the incoming data 
stream. Thus, the ACS unit typically forms the critical 
timing path within a design employing such circuits (e.g., an 
FEC design) . 

[0004] Fig. 1 is an example of such an "add-compare- 
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select" (ACS) unit 100 of the prior art. ACS unit 100 
includes two adder units (depicted in Fig. 1 as an adder unit 
110 and an adder unit 120) . Adder unit 110 takes as its 
input a previous state A cost signal 112 and a branch cost A 
signal 114. Similarly, adder unit 120 takes as its inputs a s 
previous state B cost signal 122 and a branch cost B signal 
124. The outputs of adder units 110 and 120 are fed into a 
selection circuit 130 and a compare unit 140. The output of 
selection circuit 130 is stored in a register 150, and upon 
such storage, appears at an output of register 150 as a new 
cost signal 155. 

[0005] As noted, the output of adder units 110 and 120 are 
also provided to compare unit 140. The output of adder unit 
110 is received by compare unit 140 at an input A. 
Similarly, the output of adder unit 120 is received by 
compare unit 140 at an input B. Compare unit 140 uses this 
information to make a determination as to whether the output 
of adder unit 110 is less than the output of adder unit 120 
(depicted in Fig. 1 as an output B<A) , and in so doing, for 
example, determines the smaller of the two new state costs 
output by adder units 110 and 120. 

[0006] The output of compare unit 140 is provided to 
selection unit 130 as a select signal 145. Select signal 
145 causes selection unit 13 0 to select either the output of 
adder unit 110 or the output of adder unit 120, depending on 
the outcome of the determination just described. Select 
signal 145 is also provided to a register 160, which stores 
select signal 145. Upon storage, this information appears at 
the output of register 160 as a path selection signal 165. 
[0007] Similar to the standard ACS unit, the "Add-Compare- 
Select-Off set" (ACSO) unit operates in a similar manner, with 
the exception that in an ACSO unit, the larger sum generated 
is chosen, and a small offset added to that result. Such an 
architecture is used, for example, in the design of turbo- 
convolutional decoders, which are an implementation of an 
advanced Forward Error Correction (FEC) algorithm commonly 
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used in wireless data communications. 
[0008] Fig. 2 is a block diagram illustrating an 
architecture of an ACSO unit 200 according to the prior art. 
As will be appreciated, the architecture of ACSO unit 2 00 is 
similar in many respects to that of ACS unit 100 of Fig. 1. 
In this case, an adder unit 210 takes as its input a previous 
state A likelihood (PSAL) signal 212 and a branch likelihood 
A (BLA) signal 214, producing a sum of these two signals. 
Similarly, an adder unit 220 generates the sum of a previous 
state B likelihood (PSBL) signal 222 and a branch likelihood 
B (BLB) signal 224. 

[0009] Also as before, a selection unit 230 receives these 
sums (i.e., the outputs of adder units 210 and 220), and 
selects one of these sums under the control of a compare unit 
240. Compare unit 240 receives the sum generated by adder 
unit 210 at an input A, and receives the sum generated by 
adder unit 220 at an input B. Compare unit 240 controls the 
selection made by selection unit 230 based on a comparison of 
the inputs A and B. Thus, the select signal generated by 
compare unit 240 is based on a determination as to whether 
the input B is less than the input A (depicted in Fig. 2 as 
an output B<A) . 

[0010] However, in ACSO unit 200, compare unit 240 is also 
configured to generate the difference between input A and 
input B (depicted in Fig. 2 as an output A-B) . This 
difference is provided as an address to a read-only memory 
(ROM) 250, or other such storage device. ROM 250 stores 
offsets that provide the offsets necessary to compute the new 
likelihood ratio. The outputs of selection unit 230 and ROM 
250 are summed by an adder unit 260 to generate a new 
likelihood value. This new likelihood value is clocked into 
a register 270, and appears at an output of register 270 as a 
new likelihood signal 280. 

[0011] As will be appreciated, it is not uncommon in 
today's communication systems to have multiple incoming 
channels. In such a situation, each of these channels needs 
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to be decoded. This has historically been addressed by- 
replicating the decoder circuit once for each additional 
channel. As will be appreciated, this replication increases 
the cost of such circuitry, as well as the area consumed. An 
example of such an architecture is now given. 
[0012] Fig. 3 is a block diagram depicting a multiple 
channel architecture 300 of the prior art. Multiple channel 
architecture 3 00 includes a number of ACS units (depicted in 
Fig. 3 as ACS units 310(1)-(N), where N is an integer). Each 
of ACS units 310(1)-(N) receives an input signal (depicted in 
Fig. 3 as input signals 320(1)-(N)), representing one of a 
number of channels of cost data (shown in Fig. 3 as channels 
1-N cost data). Correspondingly, each of ACS units 310(1)- 
(N) generates an output signal (depicted in Fig. 3 as output . 
signals 330(1)-(N)), representing one of a number of 
channels' new cost data and path selections (shown in Fig. 3 
as channels 1-N new cost data and path selections) . As will 
be appreciated, each channel is presented to one of ACS units 
310(1) - (N) , which performs the requisite decoding, according 
to the coding/decoding (codec) scheme employed. As will also 
be appreciated, a multi-channel turbo -convolutional scheme 
can be implemented by replacing each of ACS units 310(1)-(N) 
with an ACSO unit. 

[0013] Such an architecture is not without its 
limitations, however. While providing for the processing of 
multiple channels, the usefulness of such an approach is 
limited by the resources consumed thereby. This is 
particularly true of the integrated circuit area required by 
such an architecture. Moreover, while throughput is much 
improved over a single channel implementation as a result of 
multiple channels being processed, each ACS unit is still 
limited to processing a single symbol per symbol time, 
limiting the architecture's overall throughput. On a per- 
channel basis, the architecture's throughput is thus no 
different than that of the single channel implementation. 
Frequency of operation is similarly restricted, which may 
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result in a substantial portion of a hardware 
implementation's bandwidth going unused. 

[0014] What is therefore desired is an ACS/ACSO circuit 
that is capable of processing multiple data channels. 
Preferably, such an ACS/ACSO circuit should provide such 
capabilities with a minimal increase in circuit size as 
compared to a standard ACS/ACSO circuit. It is also 
preferable that such a circuit be capable of operating at 
clock speeds significantly higher than a standard ACS/ACSO 
circuit . 

SUMMARY 

[0015] In one embodiment, a circuit for a multi-channel 
add-compare-select unit is disclosed. The circuit includes a 
compare unit and a datapath. The datapath is coupled to the 
compare unit, and includes a number of adder units, a 
selection unit (which is coupled to the adder units) , and a 
number of clocked , storage stages. 

[0016] In another embodiment, a method is disclosed. The 
method includes producing a first sum, producing a second sum 
and producing a selected sum by selecting one of the first 
sum and the second sum. The first sum is produced from first 
data received at a first input of a number of inputs and 
second data received at a second input of the inputs. The a 
second sum is produced from third data received at a third 
input of the inputs and fourth data received at a fourth 
input of the inputs. Once produced, the selected sum is 
provided at an output. The method also includes performing a 
number of clocked storage operations on a signal, is conveyed 
on a signal path between a one of the inputs and the output. . 
[0017] The foregoing is a summary and thus contains, by 
necessity, simplifications, generalizations and omissions of 
detail; consequently , those skilled in the art will 
appreciate that the summary is illustrative only and is not 
intended to be in any way limiting. Other aspects, inventive 
features, and advantages of the present invention, as defined 
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solely by the claims, will become apparent in the non- 
limiting detailed description set forth below. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0018] The present invention may be better understood, and 
numerous objects, features, and advantages made apparent to 
those skilled in the art by referencing the accompanying 
drawings . 

[0019] Fig. 1 is a block diagram illustrating an example 
of an add-compare-select unit of the prior art. 
[0020] Fig. 2 is a block diagram illustrating an example 
of a ROM-based add-compare-select-off set unit of the prior 
art . 

[0021] Fig. 3 is a block diagram illustrating a multiple 
channel architecture of the prior art. 

[0022] Fig. 4 is a block diagram illustrating an example 
of an add-compare-select unit according to embodiments of the 
present invention . 

[0023] Fig. 5 is a flow diagram illustrating an example of 
the operation of the add-compare-select unit of Fig. 4. 
[0024] Fig. 6 is a timing diagram illustrating an example 
of the operation of the add-compare-select unit of Fig. 4. 
[0025] Fig. 7 is a block diagram illustrating an example 
of another add-compare-select unit according to embodiments 
of the present invention. 

[0026] Fig. 8 is a timing diagram illustrating an example 
of the operation of the add-compare-select unit of Fig. 7. 
[0027] Fig. 9 is a block diagram illustrating an example 
of an add-compare-select-off set unit according to embodiments 
of the present invention. 

[0028] The use of the same reference symbols in different 1 
drawings indicates similar or identical items. 

DETAILED DESCRIPTION OF THE INVENTION 

[0029] The following is intended to provide a detailed 
description of an example of the invention and should not be 
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taken to be limiting of the invention itself. Rather, any 
number of variations may fall within the scope of the 
invention which is defined in the claims following the 
description. 

Introduction 

[0030] The invention provides a method and system for a 
multi-channel ACS (or ACSO) unit. A multiple-channel ACS 
(MCACS) or multiple-channel ACSO (MCACSO) unit of the present 
invention provides a mechanism for efficiently processing 
multiple-channel data at a high throughput rate. 
[0031] An MCACS/MCACSO of the present invention is able to 
process multiple channels through its use of an interlaced 
multiple-channel data signal and pipelining the processing 
thereof. Such an MCACS/MCACSO unit employs multiple clocked 
storage stages to store the intermediate results of the 
operations performed by the MCACS/MCACSO unit. These clocked 
storage stages can be included at any point within the 
datapath from the MCACS/MCACSO unit's input to its output, 
including within the elements of the MCACS/MCACSO unit, as 
well as between such elements. As will be appreciated, the 
signals flowing through the MCACS/MCACSO unit need to be 
stored (or gated) at comparable points within the 
MCACS/MCACSO unit, to ensure that these signals are in 
synchrony (i.e., arrive at their destination within the 
MCACS/MCACSO unit at the appropriate point in time) . 
[0032] The present invention provides a number of 
advantages. Among them is the ability to operate on multiple 
data channels, while not consuming significantly more 
resources in implementation. Another advantage is the 
ability to operate at a higher frequency, and so a higher 
throughput rate, than standard techniques. 

Example A CS Circuits and Operation Thereof 
[0033] Fig. 4 is a block diagram of an example of a 
multiple channel ACS (MCACS) unit 400 according to the 
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present invention. MCACS unit 400 includes an adder unit 410 
that receives a previous state A cost (PSAC) signal 412 and a 
branch cost A (BCA) signal 414, and generates a sum of these 
two signals. Similarly, an adder unit 420 receives a 
previous state B cost (PSBC) signal 422 and a branch cost B 
(BCB) signal 424, and generates a sum of these two signals. 
This occurs in the manner of the additions performed by ACS 
unit 100 of Fig. 1. It will be appreciated, however, that in 
certain circumstances, adder units 410 and 420 actually 
implement other operations, such as subtraction (in effect, 
the addition of one operand with the negative of another 
operand) , multiplication (in effect, repeated addition) , 
logical shifting (which can effect multiplication or 
division) and the like. Such alternatives are contemplated 
by the structures described, and the elements claimed, 
herein. 

[0034] However, in the example depicted in Fig. 4, the 
sums generated by adder units 410 and 42 0 are then fed into 
clocked storage stages. Such clock storage stages are 
depicted in Fig. 4 as registers 425 and 426. As will be 
appreciated by those of skill in the art, such clock storage 
stages can be implemented in a variety of ways, and the use 
of registers in Fig. 4 (as well as other figures herein) is 
merely an example of such alternatives, which includes 
registers, shift registers, aggregations of flip-flops, RAM- 
based delay lines and other such constructs, and is provided 
to allow for familiarity in discussing such elements. It 
will also be appreciated that, in a more general sense, such 
clock storage stages can be aggregated, and viewed as a slice 
or plane through the data and control paths of the given 
MCACS/MCACSO unit. Such a plane intersects the signal paths 
at a point at which those signals passing therethrough can be 
stored/gated while having their synchrony maintained (though 
their progress through the MCACS/MCACSO unit is momentarily 
interrupted) . 

[0035] Registers 425 and 42 6 clock in the sums generated 
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by adders units 410 and 420, respectively, providing those 
values on their respective outputs and maintaining them for a 
predetermined period of time. In doing so, the use of 
registers 425 and 426 allows the inputs to adders units 410 
and 420 to change (e.g., as would occur upon the receipt of 
the next channel's data in a system of the present invention) 
without resulting in changing the signals presented to the 
next set of registers. 

[0036] The outputs of registers 425 and 426 are then 
provided to those registers, registers 427 and 428, 
respectively. As with registers 425 and 426, registers 427 
and 428 clock in a given channel's data, allowing the 
preceding stage to accept the subsequent channel's data. 
Registers 427 and 428 provide their outputs to a selection 
unit 430. The outputs of registers 425 and 426 are also 
provided to a compare unit 440. Compare unit 440 receives 
the output of register 425 (i.e. the sum generated by adder 
unit 410) at an input A. Similarly, compare unit 440 
receives the output of register 426 (i.e., the sum generated 
by adder unit 420) at an input B. Compare unit 440 compares 
input A with input B, and makes a determination as to whether 
input B is less than input A (i.e., B<A) . The results of 
this comparison are used as a select signal to control which 
sum is selected by selection unit 430. 

[0037] It is also advantageous to break up the path from 
the outputs of registers 425 and 42 6, through compare unit 
440, to selection unit 430. In the example shown in Fig. 4, 
this is accomplished by coupling a register 445 between 
compare unit 440 and selection unit 430 (i.e., clocking the 
aforementioned select signal), and so decoupling the time 
required for the signals to propagate through compare unit 
440 and through selection unit 430. This arrangement thus 
reduces the delay encountered between clocked storage stages 
by signals traversing this signal path (by including another 
clocked storage stage in that path) , and so provides an 
increase in processing speed. However, the sums 
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corresponding to this select signal (from registers 425 and 
426) also need to be clocked through a clocked storage stage 
in order to remain in synchrony with the select signal. 
Thus, rather than providing the sums from registers 425 and 
426 directly to selection unit 43 0, the sums are clocked into 
registers 427 and 428, in order to maintain the signals in 
synchrony. Thus, registers 427 and 428 are coupled between 
registers 425 and 426, respectively, and selection unit 430, 
in order to allow the sums to be provided to selection unit 
43 0 at the appropriate time. As will be appreciated, 
registers 427, 428 and 445 can, in fact, be included at 
appropriate points within selection unit 43 0, as one or more 
clock storage stages therein. 

[0038] Once this selection signal is provided to selection 
unit 43 0 and an appropriate one of the sums presented by 
registers 427 and 428 is selected, the selected sum is 
provided to a register 450 and consequently output as a new 
cost signal 455. In a similar manner, the select signal 
generated by compare unit 440 and stored by register 445 is 
clocked into a register 460, for subsequent presentation as a 
path selection signal 465. 

[0039] As will be appreciated, adder units 410 and 420, 
registers 425-428, selection unit 430 and register 450 make 
up a datapath 470 of MCACS unit 400, while compare unit 440 
and registers 445 and 460 make up a control path 480 of MCACS 
unit 400. In general, clocked storage stages can be inserted 
into datapath 470 at any point between its inputs and its 
outputs. This includes inserting clocked storage stages 
within the outer units, selection unit or in between such 
elements . This also holds true for the insertion of clocked 
storage stages within control path 480 (e.g., a clocked 
storage stage can be inserted within compare unit 440, for 
example) . 

[0040] However, in the manner previously noted, for each 
clocked storage stage inserted after the point at which 
control path 480 is coupled to datapath 470, a clocked 
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storage stage inserted in datapath 470 will mandate the 
insertion of a corresponding clocked storage stage within 
control path 480. Ultimately, the need is to provide a given 
channel's data to the various elements of datapath 470 and 
control path 480, such that data and control signals are 
present and stable at corresponding points in datapath 470 
and control path 480, and such that the outputs generated 
thereby are good for a period of time sufficient to allow 
MCACS unit 400 to operate properly. 

[0041] As will be appreciated, other components of a FEC 
decoder employing MCACS units of the present invention should 
be modified to allow processing of the interlaced channel 
data from the MCACS unit. Typically, this simply involves 
adding an extra pipeline register for each channel in the FEC 
decoder. As will be appreciated, in memory-based FPGA's, 
such constructs can be created without consuming 
significantly more area of the FPGA. 

[0042] Regarding the signals described herein, those 
skilled in the art will recognize that a signal may be 
directly transmitted from a first block to a second block, or 
a signal may be modified (e.g., amplified, attenuated, 
delayed, buffered, inverted, filtered or otherwise modified) 
between the blocks. Although the signals of the above 
described embodiment are characterized as transmitted from 
one block to the next, other embodiments of the present 
invention may include modified signals in place of such 
directly transmitted signals as long as the informational 
and/or functional aspect of the signal is transmitted between 
blocks. To some extent, a signal input at a second block may 
be conceptualized as a second signal derived from a first 
signal output from a first block due to physical limitations 
of the circuitry involved (e.g., there will inevitably be 
some attenuation and delay) . Therefore, as used herein, a 
second signal derived from a first signal includes the first 
signal or any modifications to the first signal, whether due 
to circuit limitations or due to passage through other 
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circuit elements which do not change the informational and/or 
final functional aspect of the first signal. 
[0043] The foregoing described embodiment wherein the 
different components are contained within different other 
components (e.g., the various elements shown as components of 
MCACS 400) . It is to be understood that such depicted 
architectures are merely examples, and that in fact many 
other architectures can be implemented which achieve the same 
functionality. In an abstract, but still definite sense, any 
arrangement of components to achieve the same functionality 
is effectively "associated" such that the desired 
functionality is achieved. Hence, any two components herein 
combined to achieve a particular functionality can be seen as 
"associated with" each other such that the desired 
functionality is achieved, irrespective of architectures or 
intermediate components. Likewise, any two components so 
associated can also be viewed as being " [operably] 
connected", or "[operably] coupled", to each other to achieve 
the desired functionality. 

[0044] Fig. 5 is a flow diagram illustrating the operation 
of MCACS unit 400. The process begins with the addition of 
previous costs and branch states for each input (input A and 
input B, in Fig. 4) of the given channel (i.e., the channel 
currently being received) (step 500) . Next, the given 
channel's data is clocked into its respective register (e.g., 
registers 425 and 426) (step 510), or other clocked storage 
stage (s) as may be used. With this data being presented on 
the output of registers 425 and 426, compare unit 440 is able 
to perform the requisite comparison (step 520) . 
[0045] Next, the outputs of registers 425 and 42 6, as well 
as the output of compare unit 440, are clocked into registers 
427, 428, and 445, respectively (step 530). With the sums 
generated by adder units 410 and 420 now present at the 
outputs of registers 427 and 428, and the results of the 
comparison performed by compare unit 440 presented at the 
output of register 445, selection unit 43 0 is now able to 
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select one of the sums based on the select signal it has 
received (step 540) . With the selection now made, the 
selected sum is clocked into register 450 and so presented as 
new cost signal 455 (step 550) . Similarly, the result of the 
comparison performed by comparison unit 440 is clocked in 
register 460, and presented as path selection signal 465 
(step 550) . 

[0046] Control circuitry (not shown) then makes a 
determination as to whether the decoding being performed is 
complete (i.e., whether all data from each of the channels 
has completed processing) (step 560) . If decoding is 
complete, the process is at an end. If further processing is 
required, the process loops back to the beginning, and the 
channel's data is processed further. 
[0047] As noted, Fig. 5 depicts a flow diagram 
illustrating a process according to an embodiment of the 
present invention. It is appreciated that operations 
discussed herein may consist of directly entered commands by 
a computer system user or by steps executed by application 
specific hardware modules. The functionality of steps 
referred to herein may correspond to the functionality of 
modules or portions of modules . 

[0048] The operations referred to herein may be modules or 
portions of modules (e.g., software, firmware or hardware 
modules) . For example, while the described embodiment 
includes field-programmable application specific hardware 
modules, the various example modules may be software modules 
and/or include manually entered user commands. The software 
modules discussed herein may include script, batch or other 
executable files, or combinations and/or portions of such 
files. The software modules may include a computer program 
or subroutines thereof encoded on computer-readable media. 
[0049] Additionally, those skilled in the art will 
recognize that the boundaries between modules are merely 
illustrative and alternative embodiments may merge modules or 
impose an alternative decomposition of functionality of 
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modules. For example, the modules discussed herein may be 
decomposed into submodules to be executed as multiple 
computer processes, and, optionally, on multiple computers. 
Moreover, alternative embodiments may combine multiple 
instances of a particular module or submodule. Furthermore, 
those skilled in the art will recognize that the operations 
described in example embodiment are for illustration only. 
Operations may be combined or the functionality of the 
operations may be distributed in additional operations in 
accordance with the invention. 

[0050] Alternatively, such actions may be embodied in the 
structure of circuitry that implements such functionality, 
such as the micro-code of a complex instruction set computer 
(CISC) , firmware programmed into programmable or 
erasable/programmable devices, the configuration of a field- 
programmable gate array (FPGA) , the design of a gate array or 
full-custom application-specific integrated circuit (ASIC) , 
or the like. 

[0051] The example depicted in Fig. 4 processes three 
channels of data concurrently. The operation of such an 
MCACS relies on the fact that the incoming data for the three 
channels are interlaced. For example, in the scenario where 
three data channels are used, data is provided to the MCACS 
unit in the sequence described below in connection with Fig. 
6. 

[0052] Fig. 6 is a timing diagram illustrating an example 
of the operation of an MCACS unit such as MCACS unit 400. 
Fig. 6 shows data at the inputs and outputs of such an MCACS 
unit. It will be appreciated that the sequential 
relationship depicted in Fig. 6 with regard to DATA_IN and 
DATA_OUT would also apply to the primary inputs and outputs 
of an overall convolutional decoder design employing such an 
MCACS unit. In that case, however, a much greater delay 
would be observed between a channel's data being input to 
such a decoder and the corresponding decoded data being 
output therefrom. 
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[0053] As can be seen, DATA_IN (the input data) includes 
three channels of data (depicted as DIl, DI2 and DI3), each 
interlaced in one another in a sequential fashion. It will 
be appreciated that the input data depicted in Fig. 6 (and 
subsequently, in Fig. 8) is intended to represent a vector 
of inputs, as the actual number of inputs depends on the type 
of error correction algorithm implemented. The same 
waveforms and relationships apply to each individual MCACS 
unit. The data for channels one, two, and three(i.e., DIl, 
DI2 and DI3), are fed sequentially into the MCACS unit in an 
interlaced order. As will also be appreciated, because the 
data inputs are interlaced in this fashion, the data output 
by an MCACS unit of the present invention will be similarly 
interlaced (as shown in Fig. 6 by DAT A_OUT ) . 
[0054] Data for the channels is thus received in 
sequential order, and proceeds through the given MCACS unit 
in that order. At any given time, then, clocked storage 
stages at various points within the MCACS unit each hold the 
data for a given channel (e.g., the clocked storage stages of 
MCACS 400 store the data of channels one, two or three) . 
After the next clock edge, the data for each channel proceeds 
to the next clocked storage stage (s). As will be apparent, 
if the clocked storage stage is the last such stage in the 
MCACS unit, the channel's data is available at the MCACS 
unit's output, and so would either be clocked into the same 
MCACS unit, another MCACS unit, or output from the decoder. 
This can be seen in Fig. 6, where the use of three clocked 
storage stages results in a given channel's data being 
available on the MCACS unit's output every three clock 
cycles. It will also be noted that the channel's data is 
thus also available at that time (and, as would be expected, 
in time) to be clocked into the next MCACS unit (whether the 
original MCACS unit, or a different one) , or output from the 
decoder. 

[0055] It will be noted that three clock cycles are 
required in this example in order to compute the new cost 
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values for all three channels. However, the clock can be run 
significantly faster than would otherwise be possible, 
because the feedback loop (and so the time required to get 
from one stage to another) has been divided into three 
portions. Thus, a single MCACS unit is capable of performing 
the same operations as multiple standard ACS units run 
independently on the given number of channels (in this 
example, three channels) . 

[0056] An MCACS unit (or MCACSO unit) of the present 
invention is particularly amenable to implementation using 
memory-based field-programmable gate arrays (FPGA's), because 
the MCACS unit is not significantly larger than the 
implementation of a standard ACS unit. This is because the 
extra registers employed in an MCACS unit are absorbed by the 
spare register capacity with the FPGA's configurable logic 
blocks (CLB's) implementing the adders and other logic of the 
MCACS . 

[0057] It will be appreciated that, in implementations 
according to the present invention, the combinatorial path of 
the standard ACS unit is split into multiple portions in the 
MCACS unit (e.g., being split into three portions by the 
addition of two clocked storage stages, in the example 
presented previously) . Breaking the standard ACS unit into 
stages permits the MCACS unit to operate at significantly 
higher clock frequencies than a standard ACS unit. However, 
the total throughput through an MCACS unit will typically be 
somewhat less than a parallel design (e.g., that of Fig. 3). 
As will be appreciated, of course, an MCACS unit provides 
significantly higher throughput than a standard ACS unit, and 
does so in significantly less area than a parallel design. 
As will be appreciated, the number of data channels that an 
MCACS unit is configured to process is equal to the number of 
clocked storage stages in the datapath of the MCACS unit. 

Another Example ACS Circuit and Operation Thereof 

[0058] If a number of channels other than three channels 
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is required, then the number of clocked storage stages 
required will be different, as well. For example, if four or 
more channels are to be processed, a larger number of clocked 
storage stages is employed, and arranged at some point within 
the datapath of the MCACS unit. As will be appreciated, 
these clocked storage stages can be inserted between elements 
(e.g., between the adder units and selection unit), within 
one of these elements (e.g., within the adder units or 
selection unit), or at some other point. Moreover, if 
multiple clocked storage stages are employed, constructs such 
as shift registers or equivalent structures can be employed. 
Again, in the case of memory-based FPGA's, such banks of 
clocked storage stages can be implemented using a shift 
register look-up table (LUT) (SRL) , allowing an efficient 
implementation of the MCACS unit in such an FPGA. 
[0059] Fig. 7 is a block diagram of an example of a 
multiple channel ACS (MCACS) unit 700 according to the 
present invention which is configured to support six 
channels. As noted, clocked storage stages can be inserted 
at any appropriate point in the data and control paths of an 
MCACS unit. MCACS unit 700, for example, includes registers . 
at its inputs (depicted in Fig. 7 as registers 702, 704, 706 
and 708), which provide their outputs to an adder unit 710 
and an adder unit 720. Adder unit 710 receives, via 
registers 702 and 704, a PSAC signal 712 and a BCA signal 
714, respectively, and generates a sum of these two signals. 
Similarly, adder unit 720 receives, via registers 706 and 
7 08, a PSBC signal 722 and a BCB signal 724, and generates a 
sum of these two signals. This occurs in the manner of the 
additions performed by MCACS unit 400 of Fig. 4. 
[0060] In a manner similar to Fig. 4, the sums generated 
by adder units 710 and 72 0 are then fed into clocked storage 
stages. However, such clock storage stages are depicted in 
Fig. 7 as shift registers 725 and 726 (rather than 
registers) . Shift registers 725 and 726 are three bits in 
length in this example (although shift registers of other 
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lengths could be employed, so long as an appropriate number 
of clock storage stages is maintained within MCACS unit 700) . 
Shift registers 725 and 726 clock in the sums generated by- 
adder units 710 and 720, respectively, and shift those values 
through their structures. Ultimately, after three clock 
cycles, shift registers 725 and 726 provide these values on 
their respective outputs and maintain them there for a 
predetermined period of time. 

[0061] The outputs of shift registers 725 and 726 are then 
provided to registers 727 and 728, respectively. As with 
registers 725 and 726, registers 727 and 728 clock in a given 
channel's data, allowing the preceding stage to accept a . 
subsequent channel's data at the preceding stage's inputs. 
Registers 727 and 728 provide their outputs to a selection 
unit 73 0. The outputs of registers 725 and 726 are also 
provided to a compare unit 740. Compare unit 740 receives 
the output of register 725 (i.e. the sum generated by adder 
unit 710) at an input A. Similarly, compare unit 740 
receives the output of register 72 6 (i.e., the sum generated 
by adder unit 720) at an input B. Compare unit 740 compares 
input A with input B, and makes a determination as to whether 
input B is less than input A (i.e., B<A) . The results of 
this comparison are used as a select signal to control which 
sum is selected by selection unit 730. 

[0062] However, the sums corresponding to this select 
signal will be clocked into registers 727 and 728, and so not 
provided directly to selection unit 730. That being the 
case, a register is required, in this example, between 
compare unit 740 and the select input of selection unit 73 0, 
in order to maintain the signals in synchrony. Thus, a 
register 745 is coupled between compare unit 740 and 
selection unit 73 0, allowing the select signal generated by 
compare unit 740 to be provided to selection unit 73 0 at the 
appropriate time. 

[0063] Once this selection signal is provided to selection 
unit 73 0 and an appropriate one of the sums presented by 
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registers 727 and 728 is selected, the selected sum is 
provided to a register 750 and consequently output as a new 
cost signal 755. In a similar manner, the select signal 
generated by compare unit 740 and stored by register 745 is 
clocked into a register 760, for subsequent presentation as a 
path selection signal 765. 

[0064] As will be appreciated, adder units 710 and 720, 
registers 725-728, selection unit 730 and register 750 make 
up a datapath 77 0 of MCACS unit 700, while compare unit 740 
and registers 745 and 760 make up a control path 780 of MCACS 
unit 7 00. As noted earlier, in general, clocked storage 
stages can be inserted into datapath 770 at any point between 
its inputs and its outputs. This includes inserting clocked 
storage stages within the outer units, selection unit or in 
between such elements. This also holds true for the 
insertion of clocked storage stages within control path 780 
(e.g., a clocked storage stage can be inserted within compare 
unit 740, for example) . 

[0065] However, in the manner previously noted, for each 
clocked storage stage inserted after the point at which 
control path 780 is coupled to datapath 770, a clocked 
storage stage inserted in datapath 770 will mandate the 
insertion of a corresponding clocked storage stage within 
control path 780. Ultimately, the need is to provide a given 
channel's data to the various elements of datapath 770 and 
control path 780, such that data and control signals are 
present and stable at corresponding points in datapath 770 
and control path 780, and such that the outputs generated 
thereby are good for a period of time sufficient to allow 
MCACS unit 700 to operate properly. Such operation is 
exemplified in the discussion below of Fig. 8. 
[0066] Fig. 8 is a timing diagram illustrating an example 
of the operation of an MCACS unit such as MCACS unit 700. 
Fig. 8 shows data at the primary inputs and outputs of the 
convolutional decoder design of which the MCACS unit is a 
part, in the manner of the timing presented in Fig. 8. As 
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can be seen, DATA_IN (the input data) includes six channels 
of data (depicted as DIl, DI2, DI3, DI4, DI5 and DI6), each 
interlaced in one another in a sequential fashion. The same 
waveforms and relationships apply to each individual MCACS 
unit. The data for channels one, two, three, four, five and 
six (i.e., DIl, DI2, DI3, DI4, DI5 and DI6), are fed 
sequentially into the MCACS unit in an interlaced order. 
[0067] In MCACS unit 700, three channels of data are 
stored (being shifted through) shift registers 725 and 726. 
It will be appreciated that shift registers, among other 
alternatives, can be used to add, within the limits of 
physical capacity, any number of channels to a given MCACS 
(or MCACS) . As will also be appreciated, because the data 
inputs are interlaced in this fashion, the data output by an 
MCACS unit of the present invention will be similarly 
interlaced (as shown in Fig. 8 by DAT A_OUT ) . 

Example ACSO Circuit and Operation Thereof 

[0068] It will be appreciated that a technique according 
to the present invention can be applied in the design of 
turbo-convolutional decoders, which employ ACSO units, as 
noted. 

[0069] Fig. 9 is a block diagram illustrating an 
architecture of an MCACSO unit 900 according to the present 
invention. As will be appreciated, the architecture of 
MCACSO unit 900 is similar in many respects to that of MCACS 
unit 400 of Fig. 4. As before, an adder unit 910 takes as 
its input a PSAC signal 912 and a BCA signal 914, producing a 
sum of these two signals. Similarly, an adder unit 920 
generates the sum of a PSBC signal 922 and a BLB signal 924. 
[0070] Also as before, these sums are temporarily stored 
in registers (registers 925 and 926) . A selection unit 930 
receives these sums from registers 925 and 926. The output 
of is stored by a register 935. Selection unit 930 selects 
one of these sums under the control of a compare unit 940, 
which also receives the sums from registers 925 and 926. 
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Compare unit 940 receives the sum previously generated by 
adder unit 910 at an input A, and receives the sum previously 
generated by adder unit 920 at an input B. 
[0071] Compare unit 940 controls the selection made by 
selection unit 93 0 based on a comparison of the inputs A and 
B. Thus, in one embodiment, the select signal generated by 
compare unit 940 is based on a determination as to whether 
the input B is less than the input A (depicted in Fig. 9 as 
an output B<A) . Compare unit 940 is also configured to 
generate the difference between input A and input B (depicted 
in Fig. 9 as an output A-B) . This difference is provided to 
a read-only memory (ROM) 950, or other such storage device. 
ROM 950 stores offsets that provide the offsets necessary to 
compute the new likelihood ratio. The output of ROM 950 is 
stored by a register 955. 

[0072] The output of register 935 (the stored output of 
selection unit 93 0) and register 955 are summed by an adder 
unit 960 to generate a new likelihood value. This new 
likelihood value is clocked into a register 970, and appears 
at an output of register 970 as a new likelihood signal 980. 
[0073] As will be appreciated, adder units 910 and 920, 
registers 925 and 926, selection unit 930, register 935, 
adder unit 960 and register 970 make up a datapath 99 0 of 
MCACSO unit 900, while compare unit 940, ROM 950 and register 
955 make up a control path 995 of MCACSO unit 900. As noted 
with regard to the MCACS units discussed earlier, in general, 
clocked storage stages can be inserted into datapath 990 at 
any point between its inputs and its outputs. This includes 
inserting clocked storage stages within the outer units, 
selection unit or in between such elements. This also holds 
true for the insertion of clocked storage stages within 
control path 995 (e.g., a clocked storage stage can be 
inserted within compare unit 940, for example) . 
[0074] However, in the manner previously noted, for each 
clocked storage stage inserted after the point at which 
control path 995 is coupled to datapath 990, a clocked 
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storage stage inserted in datapath 990 will mandate the 
insertion of a corresponding clocked storage stage within 
control path 995. Ultimately, the need is to provide a given 
channel's data to the various elements of datapath 990 and 
control path 995, such that data and control signals are 
present and stable at corresponding points in datapath 990 
and control path 995, and such that the outputs generated 
thereby are good for a period of time sufficient to allow 
MCACSO unit 900 to operate properly. 

[0075] While particular embodiments of the present 
invention have been shown and described, it will be obvious 
to those skilled in the art that, based upon the teachings 
herein, changes and modifications may be made without 
departing from this invention and its broader aspects and, 
therefore, the appended claims are to encompass within their 
scope all such changes and modifications as are within the 
true spirit and scope of this invention. Moreover, while the 
invention has been particularly shown and described with 
reference to these specific embodiments, it will be 
understood by those skilled in the art that the foregoing and 
other changes in the form and details may be made therein 
without departing from the spirit or scope of the invention. 
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